Conversation
Walkthrough

This PR changes Redis/DSS bootstrap vs. multi-node initialization, converts bucket-based mapping to per-slot (4096) addressing, simplifies the DataStoreServiceClient/StartService APIs, renames a GenerateMovedErrorMessage parameter, removes the USE_ONE_ELOQDSS_PARTITION_ENABLED CMake option and CI flags, and updates two submodule pointers.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Redis as RedisServiceImpl::Init
    participant Config as ClusterConfig
    participant DS as DataStoreService
    participant Client as DataStoreServiceClient
    rect rgb(230,245,255)
        Note over Redis,Config: Bootstrap path (bootstrap=true)
        Redis->>Config: construct solo ng_configs (local node_id=0)
        Config->>Config: derive ng_leaders from local config
        Redis->>DS: StartService(bootstrap=true)
        DS->>Client: ctor(bootstrap=true, peer_present=...)
    end
    rect rgb(245,230,255)
        Note over Redis,Config: Non-bootstrap path (bootstrap=false)
        Redis->>Config: ReadClusterConfigFile / ParseNgConfig
        Config->>Config: build multi-node topology
        Redis->>DS: StartService(bootstrap=false)
        DS->>Client: ctor(bootstrap=false, peer_present=...)
    end
```
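The bootstrap path in the diagram derives `ng_leaders` locally: every node group's leader is the local node. A minimal sketch of that derivation, assuming the map and id types shown elsewhere in this review (`DeriveBootstrapLeaders` is an illustrative name, not a function in the repo):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative sketch: in bootstrap/single-node mode, every node group's
// leader is the local node, so ng_leaders maps each ng_id to node_id.
std::unordered_map<uint32_t, uint32_t> DeriveBootstrapLeaders(
    const std::vector<uint32_t> &ng_ids, uint32_t local_node_id)
{
    std::unordered_map<uint32_t, uint32_t> ng_leaders;
    for (uint32_t ng_id : ng_ids)
    {
        ng_leaders.emplace(ng_id, local_node_id);
    }
    return ng_leaders;
}
```

In the non-bootstrap path the map is instead expected to come from the parsed cluster topology, which is why several comments below probe what happens when it stays empty.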
```mermaid
sequenceDiagram
    participant Cmd as ClusterKeySlotCommand
    participant Hash as Hash/Slot calc
    participant Err as GenerateMovedErrorMessage
    Cmd->>Hash: key.Hash()
    Hash->>Hash: bucket = hash & 0x3fff
    Hash->>Hash: map bucket → per-slot slot_id(s) (0..4095)
    Hash->>Err: GenerateMovedErrorMessage(slot_id)
    Err->>Cmd: reply MOVED with slot_id
```
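The masking and reply steps in the diagram can be sketched in a few lines. This is a hedged illustration only: `HashToSlot` and `MakeMovedError` are hypothetical names, and the host/port are placeholders — the actual `GenerateMovedErrorMessage` signature lives in the repo.

```cpp
#include <cstdint>
#include <string>

// Reduce a key hash to a slot index with the 14-bit mask from the diagram.
uint16_t HashToSlot(uint64_t hash)
{
    return static_cast<uint16_t>(hash & 0x3fff);
}

// Format a Redis-style MOVED redirection embedding the slot id.
std::string MakeMovedError(uint16_t slot_id, const std::string &host, int port)
{
    return "MOVED " + std::to_string(slot_id) + " " + host + ":" +
           std::to_string(port);
}
```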
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/redis_service.cpp (2 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
🔇 Additional comments (3)
store_handler (1)
1-1: Verify submodule commit and alignment with PR objectives.
The submodule pointer is being updated as part of the multi-DSS shard support refactoring. Please confirm that:
- The new commit (776a8792d67354ea827560fa0aaa37cdabdf98f3) exists and is from the correct store_handler repository.
- The changes in that commit are compatible with the DSS topology initialization refactoring described in this PR (specifically the runtime-constructed ng_leaders mappings and bootstrap mode handling).
- Integration testing has validated that store_handler changes work correctly with the updated DataStoreService/DataStoreServiceClient constructors and the new topology initialization flow.
tx_service (1)
1-1: Verify submodule compatibility with multi-shard DSS topology changes.
This is a submodule pointer update for `tx_service`. Since the PR introduces changes to the DataStoreService/DataStoreServiceClient constructors and moves topology initialization to runtime-driven ng_leaders mappings, please confirm that the new `tx_service` commit (c7e7581e402181f9ef2ed651c6017233288f6b41) is:
- Compatible with the updated constructor signatures
- Tested with the new multi-shard topology initialization logic
- A stable or well-tested commit (consider documenting any significant API changes or migration notes)
src/redis_service.cpp (1)
1039-1050: Verify behavior with empty `ng_leaders` in multi-node non-bootstrap scenarios.
The `ng_leaders` map is only populated when `FLAGS_bootstrap || is_single_node` is true (lines 1040-1046), but it is passed to `TxConfigsToDssClusterConfig` unconditionally (lines 1048-1049). This means that in multi-node non-bootstrap scenarios an empty `ng_leaders` map is passed. Please confirm whether this is the intended behavior and that `TxConfigsToDssClusterConfig` correctly handles empty leadership mappings for normal multi-node operation.
Force-pushed from b996388 to 07d1760 (Compare)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/redis_service.cpp (2 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- store_handler
- tx_service
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (2)
src/redis_service.cpp (2)
1083-1095: Follow up on past review: Verify the updated initialization logic is correct.
The past review comment flagged unused variables `init_ng_leaders` and `bootstrap_shards`. The current implementation now uses `ng_leaders` (line 1039) and passes it to `TxConfigsToDssClusterConfig` (lines 1050-1051), and the simplified constructor is used (lines 1091-1095). However, please verify:
- The `ng_leaders` logic correctly replaces the old `init_ng_leaders` initialization
- The simplified `StartService` call (lines 1083-1084) matches the intended multi-shard support design
- The `DataStoreServiceClient` constructor (lines 1091-1095) correctly uses `ds_config`, which was populated by `TxConfigsToDssClusterConfig`

Based on past review comment and current implementation.
1033-1035: Good addition: SetThisNode after fetching config from peer.
Calling `SetThisNode` to identify the local node within the fetched cluster configuration is a clean approach to integrating with an existing cluster.
Force-pushed from 07d1760 to 9eceec5 (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
2172-2192: Guard against division-by-zero when `ng_leaders` is empty in `GetNodeSlotsInfo`.
`node_group_cnt = ng_leaders.size()` can be 0; `reserve(bucket_owners.size() / node_group_cnt + 1)` then divides by zero. Add an early return (or treat it as a config error).
Apply this minimal fix:
```diff
@@
-    std::unordered_map<NodeId, std::vector<uint16_t>> owner_to_bucket_ids;
-    uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    std::unordered_map<NodeId, std::vector<uint16_t>> owner_to_bucket_ids;
+    uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    if (node_group_cnt == 0) {
+        DLOG(WARNING) << "GetNodeSlotsInfo called with empty ng_leaders; no slot mapping produced.";
+        return;
+    }
@@
-    owner_to_bucket_ids.at(leader_node_id)
-        .reserve(bucket_owners.size() / node_group_cnt + 1);
+    owner_to_bucket_ids.at(leader_node_id)
+        .reserve(bucket_owners.size() / node_group_cnt + 1);
@@
-    owner_to_bucket_ids.at(leader_node_id)
-        .reserve(bucket_owners.size() / node_group_cnt + 1 +
-                 current_size);
+    owner_to_bucket_ids.at(leader_node_id)
+        .reserve(bucket_owners.size() / node_group_cnt + 1 +
+                 current_size);
```
🧹 Nitpick comments (1)
src/redis_service.cpp (1)
2211-2240: Avoid magic numbers and harden slot range construction.
Replace the `1024`/`16` literals with named constants and initialize `end_slot_range` for robustness (single-slot edge case).
Apply this refactor:
```diff
-    // map buckets to slots
-    assert(txservice::total_range_buckets == 1024);
-    std::vector<uint32_t> slot_ids;
-    // slots count is 16 times of bucket count
-    slot_ids.reserve(bucket_ids.size() * 16);
-    for (uint16_t bucket_id : bucket_ids)
-    {
-        for (int i = 0; i < 16; i++)
-        {
-            slot_ids.push_back(bucket_id + i * 1024);
-        }
-    }
+    // Map buckets to slots (Redis: 16384 slots, 1024 buckets → 16 slots/bucket)
+    constexpr uint32_t kBucketsPerRange = txservice::total_range_buckets;
+    constexpr uint32_t kSlotsPerBucket = 16;
+    assert(kBucketsPerRange == 1024);
+    std::vector<uint32_t> slot_ids;
+    slot_ids.reserve(bucket_ids.size() * kSlotsPerBucket);
+    for (uint16_t bucket_id : bucket_ids) {
+        for (uint32_t i = 0; i < kSlotsPerBucket; ++i) {
+            slot_ids.push_back(bucket_id + i * kBucketsPerRange);
+        }
+    }
@@
-    auto res = nodes_slots.try_emplace(node_id);
-    std::vector<SlotPair> &slots_vec = res.first->second;
-    SlotPair *slot_pair = &(slots_vec.emplace_back());
-    slot_pair->start_slot_range = slot_ids.front();
+    auto res = nodes_slots.try_emplace(node_id);
+    std::vector<SlotPair> &slots_vec = res.first->second;
+    SlotPair *slot_pair = &(slots_vec.emplace_back());
+    slot_pair->start_slot_range = slot_ids.front();
+    slot_pair->end_slot_range = slot_pair->start_slot_range;  // initialize
```

Optional: pre-sort bucket_ids once, then generate slot_ids in sorted order by iterating i then buckets to avoid sorting a 16× larger vector.
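The bucket→slot expansion the diff above names (16384 slots over 1024 buckets, so each bucket owns 16 slots of the form `bucket + i * 1024`) can be demonstrated standalone. A minimal sketch, with `ExpandBucket` and the constants as illustrative names rather than repo code:

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kBuckets = 1024;        // assumed bucket count
constexpr uint32_t kSlotsPerBucket = 16;   // 16384 / 1024

// Enumerate the 16 Redis slots owned by one bucket: bucket + i * 1024.
std::vector<uint32_t> ExpandBucket(uint16_t bucket_id)
{
    std::vector<uint32_t> slots;
    slots.reserve(kSlotsPerBucket);
    for (uint32_t i = 0; i < kSlotsPerBucket; ++i)
    {
        slots.push_back(bucket_id + i * kBuckets);
    }
    return slots;
}
```

Because slots of one bucket are 1024 apart, a node owning a contiguous bucket run still ends up with 16 disjoint slot ranges, which is why the range-stitching step follows the expansion.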
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (1 hunks)
- src/redis_service.cpp (4 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- tx_service
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-26T05:07:45.185Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: concourse/scripts/build_tarball_open.bash:121-121
Timestamp: 2025-09-26T05:07:45.185Z
Learning: The open log service in build_tarball_open.bash only supports ROCKSDB as the single log state option, unlike the main eloq log service which supports multiple WITH_LOG_STATE options. Therefore, no WITH_LOG_STATE configuration flag is needed for the open log service build.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T09:00:44.250Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:103-118
Timestamp: 2025-09-24T09:00:44.250Z
Learning: The build_log_service.cmake file is specifically for building the log_service submodule which only supports ROCKSDB log state, unlike build_eloq_log_service.cmake which handles multiple WITH_LOG_STATE options (MEMORY, ROCKSDB, ROCKSDB_CLOUD_S3, ROCKSDB_CLOUD_GCS). Therefore, requiring RocksDB unconditionally in build_log_service.cmake is the correct behavior.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:53:57.928Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: src/redis_service.cpp:94-118
Timestamp: 2025-09-24T08:53:57.928Z
Learning: The LOG_STATE_TYPE_RKDB_CLOUD macro gating in src/redis_service.cpp includes `&& !defined(LOG_STATE_TYPE_RKDB)` condition intentionally to prevent conflicts when LOG_STATE_TYPE_RKDB is defined unconditionally in some builds (build_log_service.cmake) alongside cloud variants. The CMake configuration in build_eloq_log_service.cmake ensures mutual exclusivity of log state types.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:45:08.818Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:128-135
Timestamp: 2025-09-24T08:45:08.818Z
Learning: The LOG_STATE_TYPE_RKDB compile definition in build_log_service.cmake is kept unconditional for compatibility reasons, despite architectural concerns about conflicting with WITH_LOG_STATE paradigm.
Applied to files:
concourse/scripts/common.sh
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (9)
store_handler (1)
1-1: Verify the submodule pointer update in your local environment.
The sandbox environment couldn't access the store_handler submodule's commit history (both the old and new commits are unavailable in the remote). You'll need to manually verify locally that:
- The new commit hash (adda6517bec8363cd367580b6969c584229d39bd) is valid and accessible
- The changes align with the API signature updates (TxConfigsToDssClusterConfig, DataStoreServiceClient, DataStoreService::StartService) mentioned in this PR
- The submodule changes support the multi-shard DSS bootstrap logic refactoring
Run this in your local repository to inspect the changes:
```shell
cd store_handler
git log --oneline ce0e9317fd486a0e5a1342228ca4f2805d492daf..adda6517bec8363cd367580b6969c584229d39bd
git diff ce0e9317fd486a0e5a1342228ca4f2805d492daf..adda6517bec8363cd367580b6969c584229d39bd -- '*.h' '*.hpp'
```

concourse/scripts/common.sh (2)
142-201: Correct deprecation of single-partition flag in `run_build`.
Removing `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` aligns with the PR goal to support multi-shard DSS. The cmake invocation remains well-formed, and the conditional DSS build logic (lines 190-197) is unaffected.
203-217: Verify consistency: `run_build_ent` may lack the same flag removal.
The provided code for `run_build_ent` (lines 209-217) does not show the `~` mark indicating removal of `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON`, unlike `run_build` (line 154). The enriched summary states the flag was removed from both functions. Please confirm whether:
- The flag removal should also apply to `run_build_ent`, or
- The annotation is incomplete in the provided context.

If the removal should apply, ensure both functions have consistent cmake arguments to avoid build path divergence.
concourse/scripts/build_tarball.bash (1)
175-190: Correct deprecation of single-partition flag in DSS tarball build.
Removing `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=OFF` from the DSS cmake invocation is the correct counterpart to the removal in `common.sh`. The conditional logic (lines 175-179) correctly gates DSS builds to `ELOQDSS_*` types, and the simplified cmake command remains valid.

src/redis_service.cpp (5)
1055-1057: LGTM: Correctly sets local DS endpoint when using peer topology.
Calling `SetThisNode` after fetching from peer is the right place and order.

1114-1114: LGTM: StartService(true) for non-shared RocksDB.
Matches the "always create_if_missing" intent for local RocksDB deployments.

1116-1118: LGTM: StartService bootstrap flag unified across backends.
Passing `FLAGS_bootstrap || is_single_node` is consistent with init semantics.

1126-1130: LGTM: DataStoreServiceClient signature update applied correctly.
The constructor now takes the bootstrap/single-node flag first; the argument order looks right.
1061-1074: Confirm whether the non-bootstrap multi-node path requires `ng_leaders` population and ThisNode initialization.
The code structure validates the review concerns:
- In non-bootstrap scenarios where `FLAGS_bootstrap` and `is_single_node` are both false, `ng_leaders` remains empty when passed to `TxConfigsToDssClusterConfig`.
- `ds_config.SetThisNode()` is only called in the peer branch (lines 1055-1057); it is never invoked in the else branch where `TxConfigsToDssClusterConfig` is called.
- Line 2172 uses `ng_leaders.size()` to compute `node_group_cnt`, suggesting non-empty leaders are expected.

Verify whether `TxConfigsToDssClusterConfig` handles empty `ng_leaders` gracefully, or whether this code path should populate `ng_leaders` with a safe default and explicitly set ThisNode before the call.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/redis_service.cpp (1)
2162-2191: Division‑by‑zero if ng_leaders is empty.
`node_group_cnt = ng_leaders.size()` can be 0; the subsequent `reserve(bucket_owners.size() / node_group_cnt + 1)` divides by zero. Add an early return (or error) when `ng_leaders.empty()`.

Minimal safe fix:
```diff
 void RedisServiceImpl::GetNodeSlotsInfo(
     const std::unordered_map<NodeGroupId, NodeId> &ng_leaders,
     std::unordered_map<NodeId, std::vector<SlotPair>> &nodes_slots) const
 {
     LocalCcShards *local_cc_shards =
         txservice::Sharder::Instance().GetLocalCcShards();
     auto native_ng = txservice::Sharder::Instance().NativeNodeGroup();
     std::vector<std::pair<uint16_t, NodeGroupId>> bucket_owners =
         local_cc_shards->GetAllBucketOwners(native_ng);
+    if (ng_leaders.empty())
+    {
+        // No leaders known → no slot ownership info can be computed.
+        nodes_slots.clear();
+        return;
+    }
+
     // split buckets_in_ng for all ngs' leader node and sort bucket_id for each
     // ng leader node
     std::unordered_map<NodeId, std::vector<uint16_t>> owner_to_bucket_ids;
     uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
```
Optionally, also guard the denominator where `reserve(...)` uses `node_group_cnt`.
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
1061-1074: Guard no-peer, non-bootstrap, multi-node startup (`ng_leaders` empty).
This path can call `TxConfigsToDssClusterConfig` with an empty `ng_leaders` map when `!FLAGS_bootstrap && !is_single_node`, which was previously flagged. Add validation and fail fast (or supply a fallback) to avoid inconsistent DSS topology initialization.

Suggested fix (fail fast with a clear message):

```diff
 else
 {
     std::unordered_map<uint32_t, uint32_t> ng_leaders;
     if (FLAGS_bootstrap || is_single_node)
     {
         // For bootstrap, start all data store shards in current
         // process.
         for (const auto &[ng_id, ng_config] : ng_configs)
         {
             ng_leaders.emplace(ng_id, node_id);
         }
     }
     EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
         node_id, ng_configs, ng_leaders, ds_config);
+
+    if (ng_leaders.empty() && !FLAGS_bootstrap && !is_single_node)
+    {
+        LOG(ERROR) << "No peer configured and not in bootstrap/single-node: "
+                   << "unable to derive ng_leaders/topology. "
+                   << "Set --eloq_dss_peer_node or run with --bootstrap.";
+        return false;
+    }
 }
```
🧹 Nitpick comments (1)
src/redis_service.cpp (1)
2210-2239: Slot mapping refactor is correct; remove magic numbers and set end initially.
- Use `txservice::total_range_buckets` instead of the literal `1024`.
- Define `kSlotsPerBucket` instead of the literal `16`.
- Initialize `end_slot_range` to `start_slot_range` for robustness.

```diff
-    // map buckets to slots
-    assert(txservice::total_range_buckets == 1024);
-    std::vector<uint32_t> slot_ids;
-    // slots count is 16 times of bucket count
-    slot_ids.reserve(bucket_ids.size() * 16);
-    for (uint16_t bucket_id : bucket_ids)
-    {
-        for (int i = 0; i < 16; i++)
-        {
-            slot_ids.push_back(bucket_id + i * 1024);
-        }
-    }
+    // map buckets to slots
+    constexpr uint32_t kSlotsPerBucket = 16;  // 16384 / 1024
+    static_assert(txservice::total_range_buckets == 1024,
+                  "Update slot expansion if bucket count changes.");
+    std::vector<uint32_t> slot_ids;
+    slot_ids.reserve(bucket_ids.size() * kSlotsPerBucket);
+    for (uint16_t bucket_id : bucket_ids)
+    {
+        for (uint32_t i = 0; i < kSlotsPerBucket; ++i)
+        {
+            slot_ids.push_back(bucket_id + i * txservice::total_range_buckets);
+        }
+    }
     std::sort(slot_ids.begin(), slot_ids.end());

     auto res = nodes_slots.try_emplace(node_id);
     std::vector<SlotPair> &slots_vec = res.first->second;
     SlotPair *slot_pair = &(slots_vec.emplace_back());
-    slot_pair->start_slot_range = slot_ids.front();
+    slot_pair->start_slot_range = slot_ids.front();
+    slot_pair->end_slot_range = slot_ids.front();
     for (size_t idx = 1; idx < slot_ids.size(); idx++)
     {
         if (slot_ids[idx - 1] + 1 != slot_ids[idx])
         {
             slot_pair = &(slots_vec.emplace_back());
             slot_pair->start_slot_range = slot_ids[idx];
         }
         slot_pair->end_slot_range = slot_ids[idx];
     }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/redis_service.cpp (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (3)
src/redis_service.cpp (3)
1055-1057: Peer path self-identification looks correct.
Setting `ds_config.SetThisNode(local_ip, TxPort2DssPort(local_tx_port))` after fetching from peer is the right place and aligns with the revised bootstrap flow.
1113-1117: StartService() boolean usage is consistent with new API.
- RocksDB path: `true` to always create if missing (non-shared) — OK.
- Other paths: `FLAGS_bootstrap || is_single_node` — matches intent.
1126-1129: Cannot locate `TxConfigsToDssClusterConfig` implementation to verify current node is set.
The search confirmed two code paths exist:
- Peer path (line 1055): explicitly calls `ds_config.SetThisNode(...)`
- Non-peer path (lines 1072-1073): calls `TxConfigsToDssClusterConfig(node_id, ng_configs, ng_leaders, ds_config)`

However, the implementation of `TxConfigsToDssClusterConfig` could not be found in the codebase search results. It may reside in an external library, header-only code, or be inaccessible to the search. Without verifying the implementation, I cannot definitively confirm whether `TxConfigsToDssClusterConfig` internally calls `SetThisNode` on the `ds_config` object.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
2171-2191: Division-by-zero when `ng_leaders` is empty.
`node_group_cnt = ng_leaders.size()` can be 0, and `reserve(bucket_owners.size() / node_group_cnt + 1)` will divide by zero. Guard early and return (or error) if `ng_leaders` is empty. This also aligns with earlier review feedback.
Apply this minimal fix:
```diff
@@ void RedisServiceImpl::GetNodeSlotsInfo(
     std::unordered_map<NodeId, std::vector<uint16_t>> owner_to_bucket_ids;
     uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    if (node_group_cnt == 0) {
+        DLOG(WARNING) << "GetNodeSlotsInfo: ng_leaders is empty; no slot mapping produced.";
+        return;
+    }
```
🧹 Nitpick comments (1)
src/redis_service.cpp (1)
1061-1074: Clarify empty `ng_leaders` semantics in non-bootstrap, multi-node startup.
In the no-peer path, `ng_leaders` stays empty unless bootstrap/single-node. Please confirm `TxConfigsToDssClusterConfig` can handle an empty leadership map and that nothing downstream assumes leaders exist at this point. If empty means "leadership unresolved," consider logging an explicit INFO to aid ops.
Would you like a guard + log here if ng_leaders.empty() and !FLAGS_bootstrap && !is_single_node?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/redis_service.cpp (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (3)
src/redis_service.cpp (3)
1055-1058: SetThisNode after fetching peer topology — LGTM.
Setting ThisNode only after a successful `FetchConfigFromPeer` is correct and avoids stale self-identification.

1113-1117: StartService flags — LGTM.
Always creating for RocksDB (`true`) and deferring to `FLAGS_bootstrap || is_single_node` for shared storage matches the comment and expected safety.

1126-1129: DataStoreServiceClient ctor update — LGTM.
Passing the bootstrap flag into the client aligns with the new API surface.
Force-pushed from 74530e8 to fa38dd9 (Compare)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (3 hunks)
- src/redis_service.cpp (5 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- tx_service
🚧 Files skipped from review as they are similar to previous changes (2)
- concourse/scripts/build_tarball.bash
- concourse/scripts/common.sh
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (4)
store_handler (1)
1-1: Unable to access submodule commits for verification; manual review required.
The submodule commits (ce0e9317... → 7a9459481...) cannot be inspected in this environment due to remote access restrictions. To verify the submodule changes:
- Clone the repository locally and run `cd store_handler && git log ce0e9317fd486a0e5a1342228ca4f2805d492daf..7a9459481634fc344ac7b2c63e4156b0e22f187a --stat` to see what changed
- Check the submodule's commit history for breaking API changes (especially headers/interfaces)
- Confirm the changes align with the PR's "multi-dss shard" support goals
- Ensure any API updates mentioned in the parent PR are reflected in the submodule
src/redis_service.cpp (3)
1192-1197: Updated DataStoreServiceClient ctor usage looks correct.
The constructor args align with the new signature and the bootstrap/single-node flag. LGTM.

2277-2306: Slot range assembly: single-slot case now initialized correctly.
Initializing `end_slot_range` alongside `start_slot_range` fixes the previously reported edge case. The range-stitching loop is correct. LGTM.
1066-1069: Concern cannot be fully verified from codebase analysis alone.
The review identifies two initialization paths that are handled differently:
- Peer path (lines 1056-1069): `FetchConfigFromPeer()` followed by explicit `SetThisNode()`
- Non-peer path (lines 1070-1141): `TxConfigsToDssClusterConfig()` called in both the bootstrap (line 1125) and non-bootstrap (line 1138) branches

Both non-peer paths correctly derive the node_id (either `tmp_node_id` or `node_id`) by matching against `local_ip` and `local_tx_port`. However, the implementation of `TxConfigsToDssClusterConfig()` is not accessible in the repository, making it impossible to confirm whether it properly initializes the this-node metadata in `DataStoreServiceClusterManager` as the peer path does via `SetThisNode()`.
Force-pushed from 9a4ccf5 to 14981a1 (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/redis_service.cpp (2)
2242-2261: Critical: division-by-zero when `ng_leaders` is empty.
`node_group_cnt = ng_leaders.size()` can be 0; a subsequent `reserve(.../node_group_cnt)` divides by zero and is UB. Guard the empty case.
```diff
 void RedisServiceImpl::GetNodeSlotsInfo(
     const std::unordered_map<NodeGroupId, NodeId> &ng_leaders,
     std::unordered_map<NodeId, std::vector<SlotPair>> &nodes_slots) const
 {
 @@
-    uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    if (node_group_cnt == 0) {
+        // Nothing to assign; avoid UB on division by zero.
+        return;
+    }
 @@
         .reserve(bucket_owners.size() / node_group_cnt + 1);
 @@
         .reserve(bucket_owners.size() / node_group_cnt + 1 + current_size);
```

```shell
#!/bin/bash
# Expect: no occurrences of division by zero introduced by size() usage hereafter.
rg -n -C2 'bucket_owners\.size\(\)\s*/\s*node_group_cnt' src/redis_service.cpp
```
1102-1118: Bug: Iterating the wrong container when searching `tmp_node_id` in bootstrap path.
This loop should scan `tmp_ng_configs` (freshly loaded) but iterates `ng_configs`. It can fail to find the local node or pick a wrong node_id.
Apply fix:
```diff
-    bool found = false;
-    uint32_t tmp_node_id = 0;
-    for (auto &pair : ng_configs)
+    bool found = false;
+    uint32_t tmp_node_id = 0;
+    for (const auto &pair : tmp_ng_configs)
     {
         auto &ng_nodes = pair.second;
         for (size_t i = 0; i < ng_nodes.size(); i++)
         {
             if (ng_nodes[i].host_name_ == local_ip &&
                 ng_nodes[i].port_ == local_tx_port)
             {
                 tmp_node_id = ng_nodes[i].node_id_;
                 found = true;
                 break;
             }
         }
+        if (found) break;
     }
```
🧹 Nitpick comments (1)
src/redis_service.cpp (1)
2281-2310: Optional: avoid 16× expansion and sort; emit ranges layer-by-layer.
The current approach builds a 16N vector and sorts it. You can sort `bucket_ids` once and emit contiguous slot ranges per "layer" (i in [0..15]) for O(N log N + 16N) time and O(N) memory.
Illustrative replacement for the mapping/range build:
```diff
-    // map buckets to slots
-    assert(txservice::total_range_buckets == 1024);
-    std::vector<uint32_t> slot_ids;
-    // slots count is 16 times of bucket count
-    slot_ids.reserve(bucket_ids.size() * 16);
-    for (uint16_t bucket_id : bucket_ids)
-    {
-        for (int i = 0; i < 16; i++)
-        {
-            slot_ids.push_back(bucket_id + i * 1024);
-        }
-    }
-    std::sort(slot_ids.begin(), slot_ids.end());
-
-    auto res = nodes_slots.try_emplace(node_id);
-    std::vector<SlotPair> &slots_vec = res.first->second;
-    SlotPair *slot_pair = &(slots_vec.emplace_back());
-    slot_pair->start_slot_range = slot_ids.front();
-    slot_pair->end_slot_range = slot_ids.front();
-    for (size_t idx = 1; idx < slot_ids.size(); idx++)
-    {
-        if (slot_ids[idx - 1] + 1 != slot_ids[idx])
-        {
-            slot_pair = &(slots_vec.emplace_back());
-            slot_pair->start_slot_range = slot_ids[idx];
-        }
-        slot_pair->end_slot_range = slot_ids[idx];
-    }
+    std::sort(bucket_ids.begin(), bucket_ids.end());
+    auto res = nodes_slots.try_emplace(node_id);
+    std::vector<SlotPair> &slots_vec = res.first->second;
+    // Each bucket maps to 16 slots: bucket + i*1024
+    for (int layer = 0; layer < 16; ++layer) {
+        size_t start_idx = 0;
+        while (start_idx < bucket_ids.size()) {
+            uint32_t range_start = bucket_ids[start_idx] + layer * 1024;
+            uint32_t range_end = range_start;
+            size_t end_idx = start_idx;
+            while (end_idx + 1 < bucket_ids.size()
+                   && bucket_ids[end_idx + 1] == bucket_ids[end_idx] + 1) {
+                ++end_idx;
+                range_end = bucket_ids[end_idx] + layer * 1024;
+            }
+            SlotPair &sp = slots_vec.emplace_back();
+            sp.start_slot_range = range_start;
+            sp.end_slot_range = range_end;
+            start_idx = end_idx + 1;
+        }
+    }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (3 hunks)
- src/redis_service.cpp (5 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (3)
- concourse/scripts/build_tarball.bash
- store_handler
- tx_service
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-26T05:07:45.185Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: concourse/scripts/build_tarball_open.bash:121-121
Timestamp: 2025-09-26T05:07:45.185Z
Learning: The open log service in build_tarball_open.bash only supports ROCKSDB as the single log state option, unlike the main eloq log service which supports multiple WITH_LOG_STATE options. Therefore, no WITH_LOG_STATE configuration flag is needed for the open log service build.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T09:00:44.250Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:103-118
Timestamp: 2025-09-24T09:00:44.250Z
Learning: The build_log_service.cmake file is specifically for building the log_service submodule which only supports ROCKSDB log state, unlike build_eloq_log_service.cmake which handles multiple WITH_LOG_STATE options (MEMORY, ROCKSDB, ROCKSDB_CLOUD_S3, ROCKSDB_CLOUD_GCS). Therefore, requiring RocksDB unconditionally in build_log_service.cmake is the correct behavior.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:53:57.928Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: src/redis_service.cpp:94-118
Timestamp: 2025-09-24T08:53:57.928Z
Learning: The LOG_STATE_TYPE_RKDB_CLOUD macro gating in src/redis_service.cpp includes `&& !defined(LOG_STATE_TYPE_RKDB)` condition intentionally to prevent conflicts when LOG_STATE_TYPE_RKDB is defined unconditionally in some builds (build_log_service.cmake) alongside cloud variants. The CMake configuration in build_eloq_log_service.cmake ensures mutual exclusivity of log state types.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:45:08.818Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:128-135
Timestamp: 2025-09-24T08:45:08.818Z
Learning: The LOG_STATE_TYPE_RKDB compile definition in build_log_service.cmake is kept unconditional for compatibility reasons, despite architectural concerns about conflicting with WITH_LOG_STATE paradigm.
Applied to files:
concourse/scripts/common.sh
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (9)
concourse/scripts/common.sh (3)
148-154: Remove the deprecated cmake flag from `run_build`.
The removal of `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` aligns with the PR objective to support multi-dss-shard configurations. This change correctly removes the single-partition constraint. However, verify that all downstream code paths properly handle multi-shard initialization without this flag.
209-217: Verify the cmake flag removal inrun_build_ent.The
-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ONflag removal is consistent with the changes inrun_build. Confirm that the enterprise build path correctly handles multi-shard DSS topology without requiring this flag.
2970-2970: Clarify why these tests are commented out.Two test invocations have been commented out in the
run_eloq_testfunction for ELOQDSS_ROCKSDB_CLOUD_S3:
- Line 2970:
branching_snapshot_test.py- Line 2979:
rocksdb_cloud_data_store_purger_test.pyAre these tests temporarily disabled pending fixes, or should they be removed entirely? If intentional, consider adding a comment explaining the reason (e.g., "Disabled until multi-shard snapshot logic is finalized").
Also applies to: 2979-2979
src/redis_service.cpp (6)
838-845: Bootstrap solo-node init looks correct.

Creating a single NG with the local node for FLAGS_bootstrap is consistent with the new bootstrap flow.

Also applies to: 846-865

1071-1073: Good: SetThisNode after fetching DSS topology.

Setting this node after FetchConfigFromPeer prevents stale identity; Tx→DSS port mapping is used correctly.

1184-1188: RocksDB path: StartService(true) matches new semantics.

Always creating DB for local storage is expected. LGTM.

1197-1200: Client ctor updated to pass bootstrap_flag: OK.

Constructor call matches the new signature and intent.

1212-1219: Connect gating is correct.

Connecting only in bootstrap/single-node avoids premature connects in clustered startup.

1076-1131: Bootstrap DSS config assembly overall is sound after the container fix.

Building tmp_ng_configs, resolving tmp_node_id, and setting ng_leaders per-NG for DSS bootstrap aligns with the multi-shard design.
Please confirm TxConfigsToDssClusterConfig gracefully handles empty ng_leaders in the non-single-node, non-bootstrap path (it’s invoked below with possibly empty ng_leaders).
Also applies to: 1133-1144
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
concourse/scripts/common.sh (1)
142-154: Flag removal verified; add tracking comments for disabled tests.

The `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED` flag has been completely and consistently removed from both `run_build()` and `run_build_ent()` functions. However, multiple test cases in the `run_eloq_test` function (lines 2970-2979 area) are commented out without explanation or tracking:

- branching_snapshot_test
- rocksdb_cloud_data_store_purger_test
- ttl tests

Add TODO comments or issue references to these disabled tests explaining whether they are known failures under the multi-shard model or technical debt to be re-enabled.

src/redis_service.cpp (1)

646-651: Typo in config validation log message.

The message mentions `enable_io_uring` twice; it should reference `raft_log_async_fsync`.

```diff
-        LOG(ERROR) << "Invalid config: when set `enable_io_uring`, "
-                      "should also set `enable_io_uring`.";
+        LOG(ERROR) << "Invalid config: when set `raft_log_async_fsync`, "
+                      "should also set `enable_io_uring`.";
```
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
1102-1118: Break outer loop once tmp_node_id is found (tiny efficiency nit).

Stop scanning tmp_ng_configs after the match; adds no behavior change, reduces work.

```diff
 for (const auto &pair : tmp_ng_configs)
 {
     auto &ng_nodes = pair.second;
     for (size_t i = 0; i < ng_nodes.size(); i++)
     {
         if (ng_nodes[i].host_name_ == local_ip &&
             ng_nodes[i].port_ == local_tx_port)
         {
             tmp_node_id = ng_nodes[i].node_id_;
             found = true;
             break;
         }
     }
+    if (found)
+    {
+        break;
+    }
 }
```
🧹 Nitpick comments (1)
src/redis_service.cpp (1)
2281-2311: Per-slot expansion and contiguous-range emission looks correct.

- Properly initializes start/end; sorted slots guarantee correct range stitching.
- Optional: replace magic constants with named constants for readability.

```diff
-        for (int i = 0; i < 16; i++)
+        constexpr int kSlotsPerBucket = 16;
+        constexpr int kBucketsPerLayer = 1024;
+        for (int i = 0; i < kSlotsPerBucket; i++)
         {
-            slot_ids.push_back(bucket_id + i * 1024);
+            slot_ids.push_back(bucket_id + i * kBucketsPerLayer);
         }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (3 hunks)
- src/redis_service.cpp (5 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- tx_service
🚧 Files skipped from review as they are similar to previous changes (1)
- concourse/scripts/build_tarball.bash
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-26T05:07:45.185Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: concourse/scripts/build_tarball_open.bash:121-121
Timestamp: 2025-09-26T05:07:45.185Z
Learning: The open log service in build_tarball_open.bash only supports ROCKSDB as the single log state option, unlike the main eloq log service which supports multiple WITH_LOG_STATE options. Therefore, no WITH_LOG_STATE configuration flag is needed for the open log service build.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T09:00:44.250Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:103-118
Timestamp: 2025-09-24T09:00:44.250Z
Learning: The build_log_service.cmake file is specifically for building the log_service submodule which only supports ROCKSDB log state, unlike build_eloq_log_service.cmake which handles multiple WITH_LOG_STATE options (MEMORY, ROCKSDB, ROCKSDB_CLOUD_S3, ROCKSDB_CLOUD_GCS). Therefore, requiring RocksDB unconditionally in build_log_service.cmake is the correct behavior.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:53:57.928Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: src/redis_service.cpp:94-118
Timestamp: 2025-09-24T08:53:57.928Z
Learning: The LOG_STATE_TYPE_RKDB_CLOUD macro gating in src/redis_service.cpp includes `&& !defined(LOG_STATE_TYPE_RKDB)` condition intentionally to prevent conflicts when LOG_STATE_TYPE_RKDB is defined unconditionally in some builds (build_log_service.cmake) alongside cloud variants. The CMake configuration in build_eloq_log_service.cmake ensures mutual exclusivity of log state types.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:45:08.818Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:128-135
Timestamp: 2025-09-24T08:45:08.818Z
Learning: The LOG_STATE_TYPE_RKDB compile definition in build_log_service.cmake is kept unconditional for compatibility reasons, despite architectural concerns about conflicting with WITH_LOG_STATE paradigm.
Applied to files:
concourse/scripts/common.sh
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (8)
store_handler (1)

1-1: Verify submodule integration with updated DataStoreServiceClient API via integration tests.

The sandbox environment cannot access the submodule's specific commit. Please manually verify that the new store_handler commit `19ea736a` correctly integrates with these API changes:

- `DataStoreServiceClient` constructor requires `bootstrap_flag` as first parameter
- `TxConfigsToDssClusterConfig` signature changed to node_id-centric form
- `StartService` signature simplified to `StartService(bool)`

Confirm integration tests validate the bootstrap flow across the updated APIs.

concourse/scripts/common.sh (1)

203-217: The review comment's stated concern appears to be moot; the `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED` flag is already absent from both `run_build()` and `run_build_ent()` functions.

Verification results:

- run_build() cmake: `-DCMAKE_BUILD_TYPE`, `-DWITH_DATA_STORE`, `-DBUILD_WITH_TESTS=ON`, `-DWITH_LOG_SERVICE=ON`
- run_build_ent() cmake: Same as above, plus `-DOPEN_LOG_SERVICE=OFF` and `-DFORK_HM_PROCESS=ON`

The flag referenced in the review comment does not exist in either function, so there is no inconsistency to resolve. Note that `run_build_ent()` intentionally has two additional configuration flags (`-DOPEN_LOG_SERVICE=OFF`, `-DFORK_HM_PROCESS=ON`) that differentiate it from `run_build()`, which appears to be by design.

Likely an incorrect or invalid review comment.
src/redis_service.cpp (6)
838-845: Bootstrap solo-node initialization looks correct.

Solo ng_configs seeded with current node; clean fallback to file/flags otherwise.

Also applies to: 846-865

1070-1073: Good: SetThisNode after fetching peer topology.

This ordering avoids stale self-node metadata.

1124-1144: Bootstrap and single-node ng_leaders derivation is reasonable.

Assigning this node as leader in bootstrap/single-node fits the intended semantics.

1184-1188: StartService gating is appropriate.

RocksDB non-shared always true; others respect bootstrap || single-node.

1197-1200: Client constructor updated with bootstrap flag: OK.

Matches new API and call-site semantics.

1212-1219: Deferred Connect unless bootstrap/single-node: good.

Avoids premature connections before role/topology is known.
79e2b47 to
cd22760
Compare
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
2285-2314: Avoid 16× expansion/sort for slots; emit ranges per layer directly

Current code builds and sorts a 16× larger vector per node. You can sort bucket_ids once and emit contiguous slot ranges per layer to cut allocations and the O(16N log 16N) sorting.

Example replacement:

```diff
-    // map buckets to slots
-    assert(txservice::total_range_buckets == 1024);
-    std::vector<uint32_t> slot_ids;
-    slot_ids.reserve(bucket_ids.size() * 16);
-    for (uint16_t bucket_id : bucket_ids)
-    {
-        for (int i = 0; i < 16; i++)
-        {
-            slot_ids.push_back(bucket_id + i * 1024);
-        }
-    }
-    std::sort(slot_ids.begin(), slot_ids.end());
-    auto res = nodes_slots.try_emplace(node_id);
-    std::vector<SlotPair> &slots_vec = res.first->second;
-    SlotPair *slot_pair = &(slots_vec.emplace_back());
-    slot_pair->start_slot_range = slot_ids.front();
-    slot_pair->end_slot_range = slot_ids.front();
-    for (size_t idx = 1; idx < slot_ids.size(); idx++)
-    {
-        if (slot_ids[idx - 1] + 1 != slot_ids[idx])
-        {
-            slot_pair = &(slots_vec.emplace_back());
-            slot_pair->start_slot_range = slot_ids[idx];
-        }
-        slot_pair->end_slot_range = slot_ids[idx];
-    }
+    assert(txservice::total_range_buckets == 1024);
+    std::sort(bucket_ids.begin(), bucket_ids.end());
+    auto res = nodes_slots.try_emplace(node_id);
+    std::vector<SlotPair> &slots_vec = res.first->second;
+    for (int layer = 0; layer < 16; ++layer)
+    {
+        size_t i = 0;
+        while (i < bucket_ids.size())
+        {
+            uint32_t start = bucket_ids[i] + layer * 1024;
+            uint32_t end = start;
+            size_t j = i;
+            while (j + 1 < bucket_ids.size() &&
+                   bucket_ids[j + 1] == bucket_ids[j] + 1)
+            {
+                ++j;
+                end = bucket_ids[j] + layer * 1024;
+            }
+            SlotPair &sp = slots_vec.emplace_back();
+            sp.start_slot_range = start;
+            sp.end_slot_range = end;
+            i = j + 1;
+        }
+    }
```

Optional: early-return if ng_leaders.empty() to skip all work.
🧹 Nitpick comments (2)
include/redis_service.h (1)
242-242: LGTM – Parameter rename improves semantic clarity.

Renaming the parameter from `slot_num` to `slot_id` better conveys that it represents a slot identifier rather than a count, and aligns with the implementation in src/redis_service.cpp and the broader per-slot expansion in this PR.

src/redis_service.cpp (1)

2588-2612: Replace magic mask 0x3fff with a named constant

Use a constexpr (e.g., `kRedisClusterSlotMask = 0x3fff`) to avoid literals in two places and improve readability.

```diff
-    uint16_t slot_id = key->Hash() & 0x3fff;
+    constexpr uint16_t kRedisClusterSlotMask = 0x3fff;  // 16383
+    uint16_t slot_id = key->Hash() & kRedisClusterSlotMask;
@@
-    uint16_t slot_id = keys->at(0).Hash() & 0x3fff;
+    constexpr uint16_t kRedisClusterSlotMask = 0x3fff;  // 16383
+    uint16_t slot_id = keys->at(0).Hash() & kRedisClusterSlotMask;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (3 hunks)
- include/redis_service.h (1 hunks)
- src/redis_command.cpp (1 hunks)
- src/redis_service.cpp (8 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (2)
- src/redis_command.cpp
- tx_service
🚧 Files skipped from review as they are similar to previous changes (1)
- concourse/scripts/build_tarball.bash
🧰 Additional context used
🧬 Code graph analysis (2)
include/redis_service.h (1)
src/redis_service.cpp (2)
GenerateMovedErrorMessage (2501-2525)
GenerateMovedErrorMessage (2501-2501)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (6)
store_handler (1)
1-1: Submodule commit hash cannot be verified in this environment; manual verification required.

The submodule pointer update in store_handler cannot be verified because the old commit hash (`f03dd0d1901e97450d32dd7adf7858e7673af387`) is not accessible in the submodule repository. This could indicate the commit was rebased, removed from history, or the submodule has limited history available.

Ensure that:

- The new commit hash `19ea736ac39b384d9ba0bf7c2af1fc04a2fc436a` exists and contains the intended multi-shard DSS changes
- The submodule changes are compatible with updates in the main repository files
- All related API signature changes are properly integrated

concourse/scripts/common.sh (1)

154-154: LGTM – Removal of deprecated partition flag aligns with multi-shard architecture.

The removal of `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` is consistent with the PR's architectural changes supporting multi-DSS shards and the removal of this option from CMakeLists.txt.

src/redis_service.cpp (4)

838-845: Solo-node bootstrap path looks good

Using a single NG with only the local candidate simplifies bootstrap and avoids external config dependencies. No changes requested.

1060-1149: Ensure SetThisNode is applied when building ds_config from ng_configs (non-peer path)

You call SetThisNode() only when config is fetched from a peer. For the local TxConfigsToDssClusterConfig path, please confirm it sets "this node" inside ds_config (IP + DSS port) or add an explicit SetThisNode() after constructing ds_config.

If needed, add after TxConfigsToDssClusterConfig:

```diff
@@
     EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
         node_id, ng_configs, ng_leaders, ds_config);
+    ds_config.SetThisNode(
+        local_ip,
+        EloqDS::DataStoreServiceClient::TxPort2DssPort(local_tx_port));
```

1188-1192: Double-check StartService flags across backends

RocksDB path forces create_if_missing (true), which is good for non-shared storage. For other backends you use FLAGS_bootstrap || is_single_node. Please verify that StartService(false) in multi-node, non-bootstrap is the intended behavior (e.g., no shard creation, only service start), and that bootstrap/single-node creates as expected.

2501-2512: LGTM: MOVED error now uses slot_id and per-slot ranges

Signature and comparisons align with 0–16383 slot semantics. No changes requested.
fd9d2b0 to
4f9b323
Compare
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/redis_service.cpp (1)
2247-2266: Guard against division-by-zero when ng_leaders is empty.

`node_group_cnt` can be 0, leading to a divide-by-zero in `reserve(bucket_owners.size() / node_group_cnt + 1)`. Add an early return.

Apply this diff:

```diff
@@ void RedisServiceImpl::GetNodeSlotsInfo(
-    uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    uint32_t node_group_cnt = static_cast<uint32_t>(ng_leaders.size());
+    if (node_group_cnt == 0)
+    {
+        DLOG(WARNING) << "GetNodeSlotsInfo called with empty ng_leaders";
+        return;
+    }
```
♻️ Duplicate comments (2)
concourse/scripts/common.sh (1)
2970-2970: Add explanatory comments for deliberately commented-out tests.

Lines 2970 and 2979 comment out `branching_snapshot_test` and `rocksdb_cloud_data_store_purger_test` without inline explanation. As flagged in previous reviews, please clarify whether these tests are:

- Intentionally disabled due to incompatibility with multi-shard changes. If so, add a comment explaining why (e.g., "Disabled due to multi-shard slot mapping changes; re-enable when shards fully supported") and reference this PR.
- Temporarily disabled pending a fix. If so, add a TODO comment with a link to a tracking issue and expected re-enable timeline.

Without explanation, it's unclear whether these are known gaps or accidental omissions.

Also applies to: 2979-2979
src/redis_service.cpp (1)
1102-1122: Correct container used for tmp_node_id lookup (LGTM).

Switched to iterate tmp_ng_configs; breaks correctly once found.
🧹 Nitpick comments (1)
src/redis_service.cpp (1)
2286-2315: Per-slot mapping logic is correct; consider avoiding 16× expansion.

The current approach builds and sorts 16N slot_ids. You can emit ranges per 16 layers over sorted bucket_ids to reduce allocations and sort cost. I can provide a ready-to-drop helper if you want it.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (3 hunks)
- include/redis_service.h (1 hunks)
- src/redis_command.cpp (1 hunks)
- src/redis_service.cpp (8 hunks)
- store_handler (1 hunks)
- tests/unit/mono/expire.tcl (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- store_handler
🚧 Files skipped from review as they are similar to previous changes (3)
- concourse/scripts/build_tarball.bash
- tx_service
- src/redis_command.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
include/redis_service.h (1)
src/redis_service.cpp (2)
GenerateMovedErrorMessage (2502-2526)
GenerateMovedErrorMessage (2502-2502)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (10)
concourse/scripts/common.sh (2)
148-154: CMake flag removal aligns with multi-shard changes.

The removal of `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` from the cmake invocation is consistent with the PR objectives to support multi-DSS sharding.

209-217: CMake flag removal in enterprise build is consistent.

The removal of `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` from the `run_build_ent` cmake invocation mirrors the change in `run_build`, maintaining consistency across build configurations.

tests/unit/mono/expire.tcl (1)

156-160: Approve the relaxation, but consider verifying clock skew tolerance.

The change appropriately addresses clock skew in the multi-shard environment. When SETEX calculates expiration on one node and PTTL reads from another, a 100ms clock difference can cause PTTL to return up to 1100ms for a 1-second TTL.

However, there are a few concerns:

Grammar: fix the comment text:

- Line 157: "Why ttl <= 1100?" (to match the assertion)
- Lines 157-158: "calculates" instead of "calculate"

Clock skew magnitude: 100ms clock skew between nodes is significant. In production deployments with NTP, clock skew should typically be within a few milliseconds. Consider verifying whether this level of clock drift is expected in your deployment or if it indicates a clock synchronization issue.

TODO tracking: the TODO on line 159 mentions unifying the calculation node. Consider tracking this as a follow-up task.

Apply this diff to fix the grammar:

```diff
-    # Why ttl < 1100 ? Because, SETEX calculate the expired timestamp on the node that client connected
-    # and PTTL calculate the expired timestamp on the node that key located.
-    # The clock on the two nodes is not same. (TODO: the two command should calculate on same node)
+    # Why ttl <= 1100? Because SETEX calculates the expiration timestamp on the node that the client connected to,
+    # and PTTL calculates the expiration timestamp on the node where the key is located.
+    # The clocks on the two nodes may not be synchronized. (TODO: both commands should calculate on the same node)
```

Do you want me to verify the expected clock synchronization requirements for your deployment, or help track the TODO item?
include/redis_service.h (1)
242-242: Param rename aligns with per-slot semantics (LGTM).

Declaration matches implementation; consistent with slot_id usage.

src/redis_service.cpp (6)

838-865: Bootstrap solo-node path: config assembly looks correct.

Single NG with local candidate is fine for bootstrapping.

1070-1073: SetThisNode uses TxPort2DssPort; please confirm port mapping.

Ensure TxPort2DssPort(local_tx_port) yields the DSS RPC port expected by peers.

Would you confirm against the DSS networking docs?

1128-1148: Bootstrap ng_leaders synthesized to local node; confirm multi-NG intent.

Making this node leader for all NGs during bootstrap is reasonable; confirm there are no side effects when peers later join and elect leaders.

1188-1192: API changes: verify StartService(bool) semantics and client bootstrap flag.

- StartService(true) for RocksDB and (FLAGS_bootstrap || is_single_node) elsewhere: confirm the bool means "create if missing / bootstrap mode."
- DataStoreServiceClient(bootstrap_flag, …): confirm the flag drives lazy connect and topology init as intended when not bootstrap.

Also applies to: 1201-1206

2502-2526: MOVED error now keyed by slot_id (LGTM).

Range check and leader endpoint selection are correct; UNKNOWN fallback is acceptable.

2589-2613: slot_id = hash & 0x3fff matches 16384-slot cluster.

Both single- and multi-key paths compute consistent slot_ids.
4f9b323 to
ac48184
Compare
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
2286-2315: Slot mapping correctness fixed; prefer named constants and (optional) no-16× expansion.
- Good: end_slot_range is initialized; sort then coalesce ranges.
- Readability: replace magic 16/1024 with constants.
- Perf: the previously suggested per-layer approach avoids building a 16×-larger vector. See earlier sketch; apply if needed.
```diff
-    assert(txservice::total_range_buckets == 1024);
-    std::vector<uint32_t> slot_ids;
-    // slots count is 16 times of bucket count
-    slot_ids.reserve(bucket_ids.size() * 16);
-    for (uint16_t bucket_id : bucket_ids)
-    {
-        for (int i = 0; i < 16; i++)
-        {
-            slot_ids.push_back(bucket_id + i * 1024);
-        }
-    }
+    constexpr uint16_t kBucketStride = 1024;  // anchored by assert
+    constexpr int kSlotsPerBucket = 16;
+    assert(txservice::total_range_buckets == kBucketStride);
+    std::vector<uint32_t> slot_ids;
+    slot_ids.reserve(bucket_ids.size() * kSlotsPerBucket);
+    for (uint16_t bucket_id : bucket_ids)
+    {
+        for (int i = 0; i < kSlotsPerBucket; i++)
+        {
+            slot_ids.push_back(static_cast<uint32_t>(bucket_id) +
+                               i * kBucketStride);
+        }
+    }
```
🧹 Nitpick comments (5)
src/redis_command.cpp (1)
1678-1679: Hash semantics verified; consider factoring the magic number across the codebase.

EloqKey::Hash() correctly implements CRC16-XMODEM with Redis hash tag support, properly handling `{…}` patterns as per the cluster spec. The mask `0x3fff` correctly produces the slot range [0, 16383].

Minor refactoring opportunity: the magic number `0x3fff` appears in at least three locations (src/redis_command.cpp:1678, src/redis_service.cpp:2589, src/redis_service.cpp:2608). Consider extracting a shared constant to improve maintainability: `static constexpr uint16_t kClusterSlotMask = 0x3fff;`

src/redis_service.cpp (4)

1076-1149: Bootstrap leader mapping and config derivation: verify intent; consider a small refactor.

- Bootstrap maps every NG leader to this node (tmp_node_id). Confirm this is the intended "initialize all shards locally" behavior even if the node appears in only one NG in the config.
- Minor: the "load-or-parse ng_configs" code is duplicated here; extract a helper to keep the two call sites in sync.

1188-1192: StartService create-if-missing gating looks right.

Non-shared storage: always true; shared: (bootstrap || single-node). Consider adding an INFO log clarifying the flag used, for easier ops debugging.

```diff
-    ret = data_store_service_->StartService(FLAGS_bootstrap ||
-                                            is_single_node);
+    bool create_if_missing = (FLAGS_bootstrap || is_single_node);
+    DLOG(INFO) << "DataStoreService StartService(create_if_missing="
+               << (create_if_missing ? "true" : "false") << ")";
+    ret = data_store_service_->StartService(create_if_missing);
```

2502-2526: MOVED message: add a defensive guard for empty hosts.

Rare, but if an ng had no online replicas between ClusterSlots and lookup, hosts.front() would be UB. Guard and keep the UNKNOWN fallback.

```diff
-    if (slot_info.start_slot_range <= slot_id &&
-        slot_info.end_slot_range >= slot_id)
-    {
-        error_msg.append(" ");
-        error_msg.append(slot_info.hosts.front().ip);
-        error_msg.append(":");
-        error_msg.append(std::to_string(slot_info.hosts.front().port));
-        return error_msg;
-    }
+    if (slot_info.start_slot_range <= slot_id &&
+        slot_info.end_slot_range >= slot_id)
+    {
+        if (slot_info.hosts.empty())
+        {
+            LOG(WARNING) << "No hosts for slot range ["
+                         << slot_info.start_slot_range << "-"
+                         << slot_info.end_slot_range << "]";
+            break;  // fall through to UNKNOWN
+        }
+        error_msg.append(" ");
+        error_msg.append(slot_info.hosts.front().ip);
+        error_msg.append(":");
+        error_msg.append(std::to_string(slot_info.hosts.front().port));
+        return error_msg;
+    }
```

2588-2594: Deduplicate the slot mask literal; use a named constexpr.

Predeclare kRedisSlotMask = 0x3fff and use it at both call sites.

```diff
-    uint16_t slot_id = key->Hash() & 0x3fff;
+    uint16_t slot_id = key->Hash() & kRedisSlotMask;
@@
-    uint16_t slot_id = keys->at(0).Hash() & 0x3fff;
+    uint16_t slot_id = keys->at(0).Hash() & kRedisSlotMask;
```

Add once near the top of the file (after SEC_LOCAL or another constants block): `// Redis slots mask (16384 slots)` `constexpr uint16_t kRedisSlotMask = 0x3fff;`

Also applies to: 2608-2613
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- CMakeLists.txt (0 hunks)
- concourse/scripts/build_tarball.bash (1 hunks)
- concourse/scripts/common.sh (1 hunks)
- include/redis_service.h (1 hunks)
- src/redis_command.cpp (1 hunks)
- src/redis_service.cpp (8 hunks)
- store_handler (1 hunks)
- tests/unit/mono/expire.tcl (1 hunks)
- tx_service (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- store_handler
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/mono/expire.tcl
- concourse/scripts/common.sh
- tx_service
🧰 Additional context used
🧬 Code graph analysis (3)
include/redis_service.h (1)
src/redis_service.cpp (2)
GenerateMovedErrorMessage (2502-2526)
GenerateMovedErrorMessage (2502-2502)
src/redis_command.cpp (1)
include/redis_command.h (1)
key_(3927-3929)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (5)
concourse/scripts/build_tarball.bash (1)
195-195: Partition flag removal aligns with PR objective.

The DSS build now omits the partition flag, consistent with the removal of `USE_ONE_ELOQDSS_PARTITION_ENABLED` across the codebase to support multi-shard sharding. The change is straightforward and correct.

Please confirm that the DSS project's CMakeLists.txt no longer defines or expects the `USE_ONE_ELOQDSS_PARTITION_ENABLED` option, ensuring the CMake configuration aligns with this removal.

include/redis_service.h (1)

242-242: Param rename aligns with slot_id terminology. LGTM.

Consistent with per-slot routing elsewhere.

src/redis_service.cpp (3)

838-865: Bootstrap solo-node ng_configs path: confirm NodeConfig semantics.

Using a single NG (id 0) with one candidate node is fine if downstream code expects a candidate as leader. Please confirm NodeConfig(0, ip, tx_port, true) implies a candidate/primary suitable for immediate bootstrap.

1070-1073: SetThisNode after peer-config fetch is a good reorder.

Ensures self is registered against fetched topology before service start.

1201-1205: Verify DataStoreServiceClient constructor parameter order and semantics against the store_handler submodule.

The store_handler dependency is not accessible in this environment. You must verify locally that the boolean arguments and parameter order at src/redis_service.cpp:1201-1205 match the constructor signature in store_handler/data_store_service_client.h to ensure:

- FLAGS_bootstrap || is_single_node maps to the correct boolean parameter
- eloq_dss_peer_node.empty() maps to the correct boolean parameter
- All five parameters are in the intended order
Force-pushed: ac48184 to ce2d49a
Force-pushed: b645b01 to c46acb1
Force-pushed: c46acb1 to 0ffac4a
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/redis_service.cpp (3)
2290-2319: Bucket→slot mapping logic is correct; consider guarding empty `ng_leaders`

The new code:

- asserts `total_range_buckets == 1024`,
- maps each `bucket_id` into 16 slots via `bucket_id + i * 1024` (0 ≤ i < 16),
- sorts `slot_ids`, and
- coalesces contiguous ranges into `SlotPair{start_slot_range, end_slot_range}`,

which correctly produces per-slot ranges over 0–16383 and ensures `end_slot_range` is always initialized.

The only fragile part is `bucket_owners.size() / node_group_cnt` when `ng_leaders` is empty (it would be a divide-by-zero). Callers should always pass a non-empty leader map, but for defensive robustness you might either early-return when `ng_leaders.empty()` or add an explicit `DCHECK(!ng_leaders.empty())` before computing `node_group_cnt`.

2506-2530: `GenerateMovedErrorMessage` now operates on slot IDs; behavior matches slot ranges

Switching `GenerateMovedErrorMessage` to take `slot_id`, emit that in `"MOVED <slot_id> host:port"`, and compare against `SlotInfo::start_slot_range`/`end_slot_range` aligns the MOVED reply with the new 0–16383 slot space returned by `RedisClusterSlots`. Semantics look correct.

If MOVED responses ever become hot, you might consider caching the last `RedisClusterSlots` result instead of recomputing it on every MOVED, but that's an optional optimization.

2583-2621: MOVED slot computation uses 14-bit slot IDs; ensure hashing stays consistent

In the DATA_NOT_ON_LOCAL / WRITE_REQUEST_ON_SLAVE path you now compute:

- For single-key: `uint16_t slot_id = key->Hash() & 0x3fff;`
- For multi-key: `uint16_t slot_id = keys->at(0).Hash() & 0x3fff;`

and feed that into `GenerateMovedErrorMessage(slot_id)`.

Given `GetNodeSlotsInfo` maps each bucket to 16 slots as `bucket_id + i * 1024`, this `0x3fff` mask is consistent with a 16384-slot space where `bucket_id == slot_id % 1024`. As long as `TxKey`/`Sharder` also derive the routing bucket from the low 10 bits (or an equivalent `total_range_buckets`-based function), MOVED will redirect clients to the same owner that actually serves the key.

It may be worth factoring this slot calculation into a small helper (e.g., `ComputeRedisSlotIdFromHash`) shared with `ClusterKeySlotCommand` to avoid magic constants and keep hashing consistent across call sites.
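A hedged sketch of such a shared helper; the function and constant names (`ComputeRedisSlotIdFromHash`, `kRedisClusterSlots`, etc.) are illustrative assumptions, not existing code in the repo:

```cpp
#include <cstdint>

// 16384 Redis cluster slots; the low 14 bits of the key hash select the slot.
constexpr uint16_t kRedisClusterSlots = 16384;
constexpr uint16_t kRedisClusterSlotMask = kRedisClusterSlots - 1;  // 0x3fff
constexpr uint16_t kTotalRangeBuckets = 1024;

// Hypothetical shared helper: derive the cluster slot id from a key hash.
inline uint16_t ComputeRedisSlotIdFromHash(uint64_t key_hash) {
    return static_cast<uint16_t>(key_hash & kRedisClusterSlotMask);
}

// The routing bucket is the slot id modulo the bucket count,
// i.e. the low 10 bits of the same hash.
inline uint16_t SlotIdToBucketId(uint16_t slot_id) {
    return slot_id % kTotalRangeBuckets;
}
```

With this, `ComputeRedisSlotIdFromHash(h) % 1024 == h % 1024`, so MOVED replies and bucket routing stay derived from the same low bits of the hash.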
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `src/redis_service.cpp` (8 hunks)
- `store_handler` (1 hunks)
- `tx_service` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tx_service
🚧 Files skipped from review as they are similar to previous changes (1)
- store_handler
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (2)
src/redis_service.cpp (2)
838-858: Bootstrap ng_configs path is straightforward and consistent

The bootstrap branch that seeds `ng_configs` with a single local candidate node cleanly separates bootstrap from file/`ParseNgConfig` loading and is consistent with the later scan for `(local_ip, local_tx_port)` when deriving `node_id`/`native_ng_id`. I don't see correctness issues here.

1070-1148: DSS bootstrap/no-peer wiring looks good; double-check peer-node flag polarity

The new logic for the ELOQDS path:

- `FetchConfigFromPeer` + `SetThisNode(TxPort2DssPort(local_tx_port))` when `eloq_dss_peer_node` is set, and
- a bootstrap/no-peer flow that reloads `tmp_ng_configs`, finds `tmp_node_id`, and fills `ng_leaders` either for bootstrap (this node as leader for all ngs) or single-node,

is coherent and matches the intended semantics where `ng_leaders` is empty in non-bootstrap multi-node cases and handled inside `TxConfigsToDssClusterConfig`. The `StartService(true)` for RocksDB and `StartService(FLAGS_bootstrap || is_single_node)` for other backends also matches the "create if missing" vs. "attach to existing" story.

One thing to verify: in the `DataStoreServiceClient` constructor you pass `eloq_dss_peer_node.empty()` as the boolean after `ds_config`. If that parameter is conceptually "peer node present" (as suggested by the summary), it may want `!eloq_dss_peer_node.empty()` instead. Please confirm the expected meaning and adjust if necessary.

Also applies to: 1188-1192, 1201-1209
Force-pushed: 0ffac4a to cb803dd
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/redis_service.cpp (2)
2290-2319: Bucket→slot expansion and slot range compaction are correct for 16×1024 = 16384 slots

The new mapping from `bucket_ids` to per-slot IDs:

- Uses `assert(txservice::total_range_buckets == 1024)` and `slot = bucket_id + i * 1024` (i ∈ [0, 15]), which correctly covers all 16384 Redis cluster slots without overlap.
- Sorts `slot_ids` and then builds contiguous `[start_slot_range, end_slot_range]` SlotPairs; the early `bucket_ids.empty()` check and initialization of `end_slot_range` to `slot_ids.front()` avoid the previous single-slot/uninitialized issues.
- Is only on the CLUSTER NODES/SLOTS path, so the extra temporary allocation and sort are acceptable.

If you ever change `total_range_buckets`, consider deriving the 16 multiplier from it (e.g., a `constexpr slots_per_bucket` or a named `kRedisClusterSlots`) to keep the relationship between buckets and slots self-documenting, but this is not required for correctness.

2506-2529: MOVED redirection now uses real cluster slot IDs consistent with CLUSTER SLOTS

The MOVED handling has been brought into line with Redis cluster semantics:

- `GenerateMovedErrorMessage(uint16_t slot_id)` now works in terms of slot IDs, matches them against `SlotInfo.start_slot_range`/`end_slot_range` from `RedisClusterSlots`, then returns `"MOVED <slot_id> host:port"`.
- Callers compute `slot_id` as `Hash() & 0x3fff` for both single-key and multi-key (first-key) requests, which correctly maps into the 0–16383 slot space expected by clients and by the bucket→slot layout in `GetNodeSlotsInfo`.

This should make MOVED responses fully consistent with what clients see from CLUSTER SLOTS and CLUSTER NODES. If you want to avoid magic numbers, you could later introduce a named constant for the slot mask (e.g., `kRedisClusterSlotMask = 0x3fff`) and reuse it here and in the command code, but the current form is functionally sound.

Also applies to: 2593-2598, 2612-2617
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `CMakeLists.txt` (0 hunks)
- `concourse/scripts/build_tarball.bash` (1 hunks)
- `concourse/scripts/common.sh` (1 hunks)
- `include/redis_service.h` (1 hunks)
- `src/redis_command.cpp` (1 hunks)
- `src/redis_service.cpp` (8 hunks)
- `store_handler` (1 hunks)
- `tests/unit/mono/expire.tcl` (1 hunks)
- `tx_service` (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- tx_service
🚧 Files skipped from review as they are similar to previous changes (6)
- concourse/scripts/common.sh
- include/redis_service.h
- store_handler
- tests/unit/mono/expire.tcl
- concourse/scripts/build_tarball.bash
- src/redis_command.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (2)
src/redis_service.cpp (2)
838-865: Bootstrap-only ng_configs initialization is consistent and self-containedConstructing
ng_configsas a single local node group whenFLAGS_bootstrapis true cleanly isolates bootstrap from the persisted cluster topology and aligns with the later uses ofnode_id,native_ng_id, andng_configsin this method. No functional issues spotted here.
1071-1148: ELOQDS bootstrap/multi-node wiring and create-if-missing flags look coherent; double-check NodeId expectationsThe DSS paths behave coherently:
- Peer mode:
FetchConfigFromPeerfollowed byds_config.SetThisNode(local_ip, TxPort2DssPort(local_tx_port))correctly informs DSS which physical node this process runs on.- Bootstrap without peer: rebuilding
tmp_ng_configs, locatingtmp_node_id, and populatingng_leadersfor allng_idbefore callingTxConfigsToDssClusterConfigmatches the intent of seeding DSS shards for the full cluster during bootstrap.- Non-bootstrap: passing an empty
ng_leadersmap for multi-node and a fully-populated one for single-node is consistent with the earlier discussion that multi-node startup should not pre-assign DSS leaders.- The
create_if_missingbehavior is now aligned betweenDataStoreService::StartService(...)andDataStoreServiceClient’s first boolean (truefor local RocksDB,FLAGS_bootstrap || is_single_nodeotherwise), which avoids subtle mismatches.One thing worth explicitly confirming in comments/design docs: in bootstrap mode,
tx_service_usesnode_idderived from the syntheticng_configs(always 0 here), while DSS bootstrap usestmp_node_idfromtmp_ng_configs. If these node IDs are intentionally independent identifiers across layers, this is fine; if not, consider either seedingng_configsfromtmp_ng_configsor adding a short comment clarifying the separation.Also applies to: 1188-1192, 1201-1209
Force-pushed: cb803dd to 4a932e5
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/redis_command.cpp (1)
1675-1679: Cluster slot computation relies on hash semantics and a magic mask

`result_ = key_.Hash() & 0x3fff;` will give values in [0, 16383], which matches the 16384 Redis cluster slots if `EloqKey::Hash()` is the canonical cluster hash. If any other code computes `slot_id` differently (e.g., `% 16384`, or from a different hash function), MOVED errors / routing could diverge.

Two small suggestions:

- Consider replacing `0x3fff` with a named constant (e.g., `kRedisClusterSlotsMask` or `(kRedisClusterSlots - 1)`) to make the 16384-slot intent explicit.
- Double-check that all callers and sharding logic use the same `key_.Hash()` + 14-bit mask convention for `slot_id`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `CMakeLists.txt` (0 hunks)
- `concourse/scripts/build_tarball.bash` (1 hunks)
- `concourse/scripts/common.sh` (1 hunks)
- `include/redis_service.h` (1 hunks)
- `src/redis_command.cpp` (1 hunks)
- `src/redis_service.cpp` (8 hunks)
- `store_handler` (1 hunks)
- `tests/unit/mono/expire.tcl` (1 hunks)
- `tx_service` (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/unit/mono/expire.tcl
- concourse/scripts/common.sh
- store_handler
- tx_service
- include/redis_service.h
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (6)
concourse/scripts/build_tarball.bash (1)
195-195: Align with multi-shard DSS initialization: approved with verification request.

The removal of `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=OFF` is consistent with the PR objective to deprecate this build flag. The DSS build now relies on CMake default behavior for partition handling, which should support the multi-shard initialization mode.

Please confirm that the default CMake value for the partition flag (no longer explicitly set to OFF) is appropriate for multi-shard DSS deployments. Additionally, verify that this change is coordinated with corresponding updates in `CMakeLists.txt` mentioned in the enriched summary.

src/redis_service.cpp (5)

838-865: Bootstrap ng_configs initialization matches single-node bootstrap semantics

Using a solo `NodeConfig` for `ng_configs` under `FLAGS_bootstrap` keeps the bootstrap path independent of the persisted cluster config and still allows `node_id`/`native_ng_id` to be resolved correctly later. This aligns with the intent of "bootstrap only local txservice/logservice" and looks sound.

1060-1148: Confirm DSS bootstrap/no-peer topology behavior

The new DSS init logic looks reasonable overall (the peer path uses `FetchConfigFromPeer` + `SetThisNode`, the non-peer path uses `TxConfigsToDssClusterConfig` with either `tmp_ng_configs` for bootstrap or `ng_configs` plus `ng_leaders` for single/multi-node).

One design point worth double-checking: in the bootstrap/no-peer branch, `ng_leaders` is populated with `tmp_node_id` for all `tmp_ng_configs` entries, even for node groups where this node may not be a member. This assumes `TxConfigsToDssClusterConfig(tmp_node_id, tmp_ng_configs, ng_leaders, ds_config)` interprets `ng_leaders[ng_id]` purely as "which node should own DSS shards for ng_id during bootstrap", not "leader must belong to that ng". If that assumption ever changes, it might be safer to only assign `ng_leaders[ng_id] = tmp_node_id` for groups that actually contain this node.

Please confirm that this matches the intended multi-DSS-shard bootstrap design and that `TxConfigsToDssClusterConfig` is robust to this mapping.
data_store_service_->StartService(true)for RocksDB (always create if missing on local non‑shared storage), anddata_store_service_->StartService(FLAGS_bootstrap || is_single_node)for other ELOQDS backends,together with the matching bootstrap/single‑node flag passed into
DataStoreServiceClient, line up with the intended behavior: only bootstrap or true single‑node deployments auto‑initialize storage locally, while multi‑node, non‑bootstrap nodes rely on existing topology or a peer.This looks internally consistent with the new API surface.
Also applies to: 1201-1209
2290-2319: Bucket→slot expansion and slot range construction look correctThe new logic that:
- asserts
txservice::total_range_buckets == 1024,- expands each bucket to 16 slots via
slot_id = bucket_id + i * 1024(yielding 16384 slots total),- sorts
slot_ids, and- builds contiguous
SlotPairranges while initializing bothstart_slot_rangeandend_slot_range(which also fixes the single‑slot case),matches the 16384‑slot Redis Cluster model and should interoperate cleanly with
RedisClusterNodes/RedisClusterSlotsconsumers.No functional issues spotted here.
2506-2529: MOVED error now keyed by full 14‑bit slot id – verify hash/slot consistencyUpdating
GenerateMovedErrorMessageto take aslot_idand computing it askey->Hash() & 0x3fff(0–16383) aligns MOVED replies with Redis’ 16384‑slot addressing and the new per‑slotSlotPairranges.The only assumption to confirm is that:
- the same underlying hash used by
Sharder/GetAllBucketOwnersto assign buckets corresponds to thisHash()value, and- the bucket→slot expansion in
GetNodeSlotsInfois consistent withslot_id = Hash() & 0x3fff(i.e., thatHash()exposes the full CRC16‑style hash, not an already‑reduced bucket index).If
Hash()was historically returning a reduced bucket id, this change would alter the advertised slot in MOVED errors without changing actual routing. Please verify thatTxKey::Hash()(and related code) now returns a value appropriate for 16384‑slot hashing so that MOVED slot ids and cluster slot maps stay in sync.Also applies to: 2593-2598, 2613-2617
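The bucket-to-slot expansion and range construction reviewed here can be sketched as follows; `SlotPair` and the constants mirror the review's description and are assumptions, not the repo's actual definitions:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the SlotPair described in this review.
struct SlotPair {
    uint16_t start_slot_range;
    uint16_t end_slot_range;
};

constexpr uint32_t kTotalRangeBuckets = 1024;  // txservice::total_range_buckets
constexpr uint32_t kSlotsPerBucket = 16;       // 16 * 1024 = 16384 Redis slots

// Expand owned bucket ids into slot ids, then coalesce contiguous
// slots into [start_slot_range, end_slot_range] pairs.
std::vector<SlotPair> BucketsToSlotRanges(const std::vector<uint16_t> &bucket_ids) {
    std::vector<SlotPair> ranges;
    if (bucket_ids.empty()) {
        return ranges;
    }
    std::vector<uint16_t> slot_ids;
    slot_ids.reserve(bucket_ids.size() * kSlotsPerBucket);
    for (uint16_t bucket_id : bucket_ids) {
        for (uint32_t i = 0; i < kSlotsPerBucket; ++i) {
            slot_ids.push_back(
                static_cast<uint16_t>(bucket_id + i * kTotalRangeBuckets));
        }
    }
    std::sort(slot_ids.begin(), slot_ids.end());
    SlotPair cur{slot_ids.front(), slot_ids.front()};
    for (size_t i = 1; i < slot_ids.size(); ++i) {
        if (slot_ids[i] == cur.end_slot_range + 1) {
            cur.end_slot_range = slot_ids[i];  // extend the contiguous run
        } else {
            ranges.push_back(cur);
            cur = SlotPair{slot_ids[i], slot_ids[i]};
        }
    }
    ranges.push_back(cur);
    return ranges;
}
```

A node owning all 1024 buckets collapses to the single range [0, 16383], while a node owning only bucket 0 yields 16 one-slot ranges (0, 1024, ..., 15360), since its slots are not adjacent.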
Force-pushed: 4a932e5 to 1c03ea4
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/unit/mono/expire.tcl (1)
152-160: PTTL tolerance change is reasonable; consider tightening comment wording.

Relaxing the upper bound to `<= 1100` nicely accounts for cross-node clock skew between the SETEX node and the PTTL node while still catching gross errors. The explanatory comment is helpful; if you touch this again, you might slightly rephrase it (e.g., "SETEX computes the expire timestamp on the client's node, while PTTL uses the key's owner node; their clocks may differ") but there's no functional issue here.

src/redis_service.cpp (2)

841-868: Bootstrap and ELOQDS multi-shard wiring look consistent; consider documenting the empty `ng_leaders` contract.

- The new FLAGS_bootstrap branch that seeds `ng_configs` with a single local node (ng_id 0, node_id 0) is coherent with the later early-exit bootstrap path and keeps bootstrap independent of any pre-existing cluster_config file.
- In the ELOQDS block:
  - Peer mode (`eloq_dss_peer_node` non-empty) correctly pulls topology from the peer and then calls `ds_config.SetThisNode(local_ip, TxPort2DssPort(local_tx_port))`, which is what you want for joining an existing DSS cluster.
  - The bootstrap/no-peer path reloads full `tmp_ng_configs`, finds the local `tmp_node_id`, marks it leader for all `ng_id`s in `ng_leaders`, and feeds that into `TxConfigsToDssClusterConfig`. That matches the goal of "bootstrap all DSS shards on this node."
  - The non-bootstrap path only populates `ng_leaders` for `is_single_node`, leaving it empty for multi-node, which aligns with your earlier note that multi-node non-bootstrap should pass an empty map and let DSS resolve shard leadership itself.
- The booleans for `StartService(true)` vs `StartService(FLAGS_bootstrap || is_single_node)` and the first parameter to `DataStoreServiceClient` are internally consistent and reflect the shared-vs-non-shared storage distinction.

I'd only suggest adding a short comment where `ng_leaders` is intentionally left empty in the multi-node non-bootstrap case to encode that contract for future readers; behavior otherwise looks correct.

Also applies to: 1081-1159, 1202-1205, 1215-1223

2304-2333: Slot mapping and MOVED error generation are consistent with 16K-slot Redis semantics.

- `GetNodeSlotsInfo` now maps each bucket to 16 slots (`bucket_id + i*1024` with `total_range_buckets == 1024`), sorts the resulting `slot_ids`, and collapses contiguous sequences into `SlotPair` ranges. That yields a clean 0–16383 slot space and supports multi-shard leaders (one node leading multiple ngs) correctly.
- MOVED handling uses `slot_id = Hash() & 0x3fff` for both single-key and multi-key requests, and `GenerateMovedErrorMessage(uint16_t slot_id)` searches the `RedisClusterSlots` output for a range containing that slot. Given CLUSTER KEYSLOT is using the same `Hash() & 0x3fff` convention, this keeps MOVED, CLUSTER SLOTS, and CLUSTER KEYSLOT aligned.
- Using `uint16_t` for `slot_id` is safe here since the mask restricts values to [0, 16383].

If you want extra observability, you could log the `slot_id` in the "slot not found" error path, but functionally the changes look solid.

Also applies to: 2520-2531, 2607-2631
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `CMakeLists.txt` (0 hunks)
- `concourse/scripts/build_tarball.bash` (1 hunks)
- `concourse/scripts/common.sh` (1 hunks)
- `include/redis_service.h` (1 hunks)
- `src/redis_command.cpp` (1 hunks)
- `src/redis_service.cpp` (8 hunks)
- `store_handler` (1 hunks)
- `tests/unit/mono/expire.tcl` (1 hunks)
- `tx_service` (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (4)
- include/redis_service.h
- concourse/scripts/build_tarball.bash
- tx_service
- src/redis_command.cpp
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-26T05:07:45.185Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: concourse/scripts/build_tarball_open.bash:121-121
Timestamp: 2025-09-26T05:07:45.185Z
Learning: The open log service in build_tarball_open.bash only supports ROCKSDB as the single log state option, unlike the main eloq log service which supports multiple WITH_LOG_STATE options. Therefore, no WITH_LOG_STATE configuration flag is needed for the open log service build.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T09:00:44.250Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:103-118
Timestamp: 2025-09-24T09:00:44.250Z
Learning: The build_log_service.cmake file is specifically for building the log_service submodule which only supports ROCKSDB log state, unlike build_eloq_log_service.cmake which handles multiple WITH_LOG_STATE options (MEMORY, ROCKSDB, ROCKSDB_CLOUD_S3, ROCKSDB_CLOUD_GCS). Therefore, requiring RocksDB unconditionally in build_log_service.cmake is the correct behavior.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:45:08.818Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: build_log_service.cmake:128-135
Timestamp: 2025-09-24T08:45:08.818Z
Learning: The LOG_STATE_TYPE_RKDB compile definition in build_log_service.cmake is kept unconditional for compatibility reasons, despite architectural concerns about conflicting with WITH_LOG_STATE paradigm.
Applied to files:
concourse/scripts/common.sh
📚 Learning: 2025-09-24T08:53:57.928Z
Learnt from: githubzilla
Repo: eloqdata/eloqkv PR: 193
File: src/redis_service.cpp:94-118
Timestamp: 2025-09-24T08:53:57.928Z
Learning: The LOG_STATE_TYPE_RKDB_CLOUD macro gating in src/redis_service.cpp includes `&& !defined(LOG_STATE_TYPE_RKDB)` condition intentionally to prevent conflicts when LOG_STATE_TYPE_RKDB is defined unconditionally in some builds (build_log_service.cmake) alongside cloud variants. The CMake configuration in build_eloq_log_service.cmake ensures mutual exclusivity of log state types.
Applied to files:
concourse/scripts/common.sh
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (3)
concourse/scripts/common.sh (3)
142-154: Removal of USE_ONE_ELOQDSS_PARTITION_ENABLED flag is correctly applied.

The cmake invocation in `run_build` now removes the `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` flag as part of the multi-shard support changes. The `-DWITH_LOG_SERVICE=ON` flag is correctly retained. No concerns with this change.
203-217: Enterprise build function correctly mirrors flag removal.

The `run_build_ent` function similarly removes the `-DUSE_ONE_ELOQDSS_PARTITION_ENABLED=ON` flag while preserving the enterprise-specific configuration flags (`OPEN_LOG_SERVICE=OFF`, `FORK_HM_PROCESS=ON`) and `-DWITH_LOG_SERVICE=ON`.
2970-2980: Clarify the re-enablement of previously disabled tests.

Lines 2970 and 2979 now execute `branching_snapshot_test` and `rocksdb_cloud_data_store_purger_test` for the `ELOQDSS_ROCKSDB_CLOUD_S3` storage type. Previous reviews flagged these as disabled. Please add a brief comment explaining:
- Why these tests were originally disabled (e.g., multi-shard incompatibility).
- What changed to make them compatible again (e.g., specific fixes or test updates in this PR).
If intentional, this is good; if accidental, these lines may need to remain commented. A clarifying comment will help future maintainers understand the decision.
store_handler (outdated)

```diff
@@ -1 +1 @@
-Subproject commit 87ee52bf0672fee7c8f4b190c93b99cdfcb1bb37
+Subproject commit dab3aa8130da6cd414ed6623c24da42e28aabb10
```
🧩 Analysis chain
Verify the submodule update aligns with multi-shard DSS changes.
The submodule pointer has been updated. Ensure that the changes in the store_handler submodule are compatible with the multi-shard DSS support and related Redis service layer modifications in this PR.
Run the following script to verify the submodule update:
🏁 Script executed:
#!/bin/bash
# Description: Verify the store_handler submodule commit exists and review the changes
# Check if the new commit exists in the submodule
cd store_handler 2>/dev/null || {
echo "Error: store_handler directory not found"
exit 1
}
# Verify the new commit exists
NEW_COMMIT="dab3aa8130da6cd414ed6623c24da42e28aabb10"
if git cat-file -e "$NEW_COMMIT^{commit}" 2>/dev/null; then
echo "✓ New commit $NEW_COMMIT exists"
else
echo "✗ New commit $NEW_COMMIT not found in store_handler"
exit 1
fi
# Show the commit log between old and new commits
OLD_COMMIT="87ee52bf0672fee7c8f4b190c93b99cdfcb1bb37"
echo ""
echo "Changes in store_handler submodule:"
git log --oneline "$OLD_COMMIT..$NEW_COMMIT" 2>/dev/null || {
echo "Unable to compare commits. Ensure the submodule is initialized and up-to-date."
}
# Show files changed in the submodule
echo ""
echo "Files changed in store_handler:"
git diff --name-status "$OLD_COMMIT" "$NEW_COMMIT" 2>/dev/nullLength of output: 80
🏁 Script executed:
#!/bin/bash
# Description: Investigate store_handler submodule status and commit validity
# Check submodule status and configuration
echo "=== Submodule Configuration ==="
git config -f .gitmodules --get submodule.store_handler.path
git config -f .gitmodules --get submodule.store_handler.url
echo ""
echo "=== Submodule Status ==="
git submodule status store_handler 2>/dev/null || echo "Submodule status unavailable"
echo ""
echo "=== Checking Remote Repository ==="
REMOTE_URL=$(git config -f .gitmodules --get submodule.store_handler.url)
if [ -n "$REMOTE_URL" ]; then
echo "Remote URL: $REMOTE_URL"
NEW_COMMIT="dab3aa8130da6cd414ed6623c24da42e28aabb10"
echo "Checking if commit exists in remote:"
git ls-remote "$REMOTE_URL" "$NEW_COMMIT" 2>/dev/null || {
echo "Unable to query remote. Checking if commit appears in refs:"
git ls-remote "$REMOTE_URL" | grep "$NEW_COMMIT" || echo "Commit not found in remote refs"
}
else
echo "Unable to determine remote URL"
fi
echo ""
echo "=== Current store_handler state (if initialized) ==="
if [ -d store_handler/.git ]; then
cd store_handler
echo "Current HEAD:"
git rev-parse HEAD 2>/dev/null || echo "Unable to determine HEAD"
echo "Remotes:"
git remote -v 2>/dev/null || echo "No remotes configured"
cd ..
else
echo "Submodule not initialized locally"
fiLength of output: 392
Submodule update points to non-existent commit - cannot proceed.
The store_handler submodule is being updated to commit dab3aa8130da6cd414ed6623c24da42e28aabb10, but this commit does not exist in the remote repository (https://github.com/eloqdata/store_handler.git). This would cause repository checkout and CI/CD failures.
Verify that:
- The commit hash is correct
- The changes have been pushed to the store_handler repository
- The submodule pointer is updated to a valid, existing commit before merging
🤖 Prompt for AI Agents
In store_handler around lines 1 to 1, the submodule pointer references commit
dab3aa8130da6cd414ed6623c24da42e28aabb10 which does not exist in
https://github.com/eloqdata/store_handler.git; to fix, verify the correct commit
hash in that remote (or push the missing commit), update the submodule pointer
to a valid existing commit (git submodule sync; cd store_handler; git fetch &&
git checkout <valid-commit>), commit the updated .gitmodules/.gitlinks in the
parent repo, and re-run git submodule update --init --recursive to ensure CI can
checkout the submodule.
Commit message:

- Update bucket map to key-slot for total_bucket_count changed to 1024
- Remove compile option USE_ONE_ELOQDSS_PARTITION_ENABLED
- Only use one node group to start logservice and txservice on bootstrap
- Update ClusterKeySlot command
- Dss client add arg bind_data_shard_with_ng
- Update test case expire.tcl
Force-pushed: 1c03ea4 to 6c87cd4
Force-pushed: 6c87cd4 to e89f616
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/redis_command.cpp (1)
1676-1680: Cluster keyslot now maps directly to 16K slots; consider de-duplicating the magic constant

`result_ = key_.Hash() & 0x3fff;` correctly constrains the slot to [0, 16383], which matches the hard-coded 16384 slot count reported in `ClusterInfoCommand`. To keep things consistent and self-documenting, consider introducing a shared `constexpr` (e.g. `kRedisClusterSlots = 16384` and `kRedisClusterSlotMask = kRedisClusterSlots - 1`) and using it both here and in the cluster info strings instead of repeating `0x3fff`/`16384`.

src/redis_service.cpp (1)

2304-2333: Per-slot mapping from buckets looks logically correct; consider minor robustness and perf tweaks

The new mapping:

- Asserts `txservice::total_range_buckets == 1024`,
- Expands each bucket id `b` into 16 slots `b + i * 1024` (0 ≤ i < 16),
- Sorts `slot_ids` and compresses them into contiguous `[start_slot_range, end_slot_range]` pairs,

which is consistent with a 16384-slot Redis cluster and should cover each slot exactly once per owning node, with correct handling of the single-slot case via initialization of both `start_slot_range` and `end_slot_range`.

Two optional improvements you may want to consider:

- Robustness to future callers: if `ng_leaders` could ever be empty outside the current call sites, an early `if (ng_leaders.empty()) return;` would guard the `bucket_owners.size() / node_group_cnt` reserve against a regression back to division-by-zero if this function gets reused.
- Performance: instead of the nested `for (ng_leader) { for (bucket_owners) { ... } }`, you can do a single pass over `bucket_owners`, look up the leader via `ng_leaders.at(pair.second)`, and append the bucket to that leader's vector. That reduces this block from O(|ng_leaders| * |bucket_owners|) to O(|bucket_owners|) with simpler control flow.

Functionally it looks fine as-is; these would just make it more future-proof and efficient.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- `CMakeLists.txt` (0 hunks)
- `concourse/scripts/build_tarball.bash` (1 hunks)
- `concourse/scripts/common.sh` (1 hunks)
- `include/redis_service.h` (1 hunks)
- `src/redis_command.cpp` (1 hunks)
- `src/redis_service.cpp` (8 hunks)
- `store_handler` (1 hunks)
- `tests/unit/mono/expire.tcl` (1 hunks)
- `tx_service` (1 hunks)
💤 Files with no reviewable changes (1)
- CMakeLists.txt
✅ Files skipped from review due to trivial changes (1)
- tx_service
🚧 Files skipped from review as they are similar to previous changes (5)
- concourse/scripts/build_tarball.bash
- tests/unit/mono/expire.tcl
- concourse/scripts/common.sh
- store_handler
- include/redis_service.h
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (3)
src/redis_service.cpp (3)
841-868: Bootstrap-only ng_configs initialization is consistent with single-node bootstrap semantics

Special-casing `FLAGS_bootstrap` to synthesize a single-node `ng_configs` entry for the local node, while keeping the existing config-file / `ParseNgConfig` path for non-bootstrap, looks coherent and keeps the rest of the initialization logic unchanged. No issues from my side here.
1081-1159: DSS bootstrap/topology wiring and create_if_missing flags look correct, but double-check intended leadership mapping

The new logic to:

- Use `SetThisNode()` when bootstrapping from a peer, and
- In the no-peer case, build `ds_config` via `TxConfigsToDssClusterConfig` with:
  - `tmp_ng_configs` + `ng_leaders[ng_id] = tmp_node_id` during `FLAGS_bootstrap`, and
  - `ng_leaders` populated only for `is_single_node` (left empty for multi-node),

plus threading `FLAGS_bootstrap || is_single_node` consistently into `StartService()` and the `DataStoreServiceClient` ctor, matches the multi-shard design and avoids creating data on non-leaders in regular multi-node startup.

One thing worth a quick sanity check: during bootstrap, `ng_leaders` is filled with the same `tmp_node_id` for every `ng_id`. If `TxConfigsToDssClusterConfig` assumes that `ng_leaders[ng_id]` must be a node that actually belongs to that `ng_id` in `tmp_ng_configs`, please confirm this "local node leads all ngs for bootstrap" mapping is explicitly supported there.

Also applies to: 1198-1206, 1215-1223
2520-2531: Slot-based MOVED error generation aligns with 16384 slots; confirm Hash() consistency with sharding

Renaming `GenerateMovedErrorMessage` to take a `slot_id` and computing that as `key->Hash() & 0x3fff` (for both single-key and multi-key requests) lines up with the Redis 0–16383 slot space and with the per-slot ranges produced by `RedisClusterSlots`/`GetNodeSlotsInfo`. The range check in `GenerateMovedErrorMessage` over `slot_info.start_slot_range`/`end_slot_range` should therefore send clients to the correct master.

To be completely sure there's no subtle mismatch, please confirm that:

- The same `Hash()` value (or at least its low bits) is what Sharder / bucket assignment uses, so that `slot_id % 1024` matches the bucket ID used when building `bucket_owners` in `GetNodeSlotsInfo`.

If that holds, this MOVED-path change looks good.
Also applies to: 2607-2631
Summary by CodeRabbit
New Features
Bug Fixes
Chores
API
✏️ Tip: You can customize this high-level summary in your review settings.